A Statistical Framework for Testing Functional Categories in Microarray Data
Abstract
Ready access to emerging databases of gene annotation and functional pathways has shifted assessments of differential expression in DNA microarray studies from single genes to groups of genes with shared biological function. This paper takes a critical look at existing methods for assessing the differential expression of a group of genes (functional category), and provides some suggestions for improved performance. We begin by presenting a general framework, in which the set of genes in a functional category is compared to the complementary set of genes on the array. The framework includes tests for overrepresentation of a category within a list of significant genes, and methods that consider continuous measures of differential expression. Existing tests are divided into two classes. Class 1 tests assume gene-specific measures of differential expression are independent, despite overwhelming evidence of positive correlation. Analytic and simulated results are presented that demonstrate Class 1 tests are strongly anti-conservative in practice. Class 2 tests account for gene correlation, typically through array permutation that by construction has proper Type I error control for the induced null. However, both Class 1 and Class 2 tests use a null hypothesis that all genes have the same degree of differential expression. We introduce a more sensible and general (Class 3) null under which the profile of differential expression is the same within the category and complement. Under this broader null, Class 2 tests are shown to be conservative. We propose standard bootstrap methods for testing against the Class 3 null and demonstrate they provide valid Type I error control and more power than array permutation in simulated datasets and real microarray experiments.
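As a rough illustration of the distinction drawn in the abstract, the following Python sketch contrasts a Class 1 style overrepresentation test (a hypergeometric tail probability, which treats genes as independent draws) with a Class 2 style array permutation test (which shuffles array labels and therefore keeps the gene-gene correlation intact). The function names, the two-group layout, and the toy statistic (mean absolute group difference within the category minus the same mean in the complement) are assumptions made for illustration and are not taken from the paper. The bootstrap test against the Class 3 null is not shown.

```python
import math
import random
from statistics import fmean


def hypergeom_upper_tail(n_genes, n_category, n_significant, n_overlap):
    """Class 1 style test: P(X >= n_overlap) when n_significant genes are
    drawn without replacement from n_genes, n_category of which belong to
    the functional category. Assumes genes are independent."""
    total = math.comb(n_genes, n_significant)
    upper = min(n_category, n_significant)
    tail = sum(
        math.comb(n_category, k)
        * math.comb(n_genes - n_category, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def abs_group_difference(row, labels):
    """Absolute difference in mean expression between label 1 and label 0."""
    g1 = [x for x, lab in zip(row, labels) if lab == 1]
    g0 = [x for x, lab in zip(row, labels) if lab == 0]
    return abs(fmean(g1) - fmean(g0))


def category_statistic(expr, labels, category):
    """Hypothetical statistic: mean per-gene difference inside the category
    minus the mean per-gene difference in the complementary gene set."""
    diffs = [abs_group_difference(row, labels) for row in expr]
    inside = [d for g, d in enumerate(diffs) if g in category]
    outside = [d for g, d in enumerate(diffs) if g not in category]
    return fmean(inside) - fmean(outside)


def array_permutation_pvalue(expr, labels, category, n_perm=1000, seed=0):
    """Class 2 style test: permute array (column) labels, which leaves the
    correlation between genes untouched, and recompute the statistic."""
    rng = random.Random(seed)
    observed = category_statistic(expr, labels, category)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if category_statistic(expr, shuffled, category) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Here `expr` is a list of per-gene rows with one value per array, `labels` marks each array as 0 or 1, and `category` is a set of row indices. Because whole arrays are relabeled rather than individual genes, the permutation null retains whatever dependence exists between genes, which is the property the abstract attributes to Class 2 tests.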